%\documentclass[10pt,conference,letterpaper]{IEEEtran}
\documentclass{sig-alternate}
\usepackage{times}
%\usepackage[english]{algorithm2e}
\usepackage{algorithm}
\usepackage{algpseudocode}
%\usepackage[named]{algo}
%\algref{<algorithm>}{<line>}
\newtheorem{theorem}{Theorem}
%\newcounter{Observation}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{Observation}[theorem]{Observation}
\def\candidate{{\cal C}}
\def\comment#1{}
\usepackage{graphicx}
\input{psfig}


\pagestyle{empty}


\special{pdf: pagesize width 8.5in height 11in}

\begin{document}

\conferenceinfo{SIGMOD'11,} {June 12--16, 2011, Athens, Greece.}
\CopyrightYear{2011} \crdata{978-1-4503-0661-4/11/06}
\clubpenalty=10000 \widowpenalty = 10000

% ****************** TITLE ****************************************

\title{Finding Semantics in Time Series}


\numberofauthors{3}
\author{
%\alignauthor
Peng Wang $^{1,2}$\titlenote{This work was done at Microsoft Research Asia}\hspace{.8cm}Haixun Wang $^2$\hspace{.8cm}Wei Wang $^1$ \vspace{.2cm}\\
        %\affaddr{~}\\\vspace*{-.2cm}
       \affaddr{$^1$Fudan University, Shanghai, China}\\
       \affaddr{\small \{pengwang5, weiwang1\}@fudan.edu.cn}\\%\vspace*{.1cm}
       \affaddr{$^2$Microsoft Research Asia,  Beijing, China}\\
       \affaddr{\small haixunw@microsoft.com}\\
}


\maketitle

\begin{abstract}
  In order to understand a complex system, we analyze its output or
  its log data. For example, we track a system's resource consumption
  (CPU, memory, message queues of different types, etc.) to help avert
  system failures; we examine economic indicators to assess the
  severity of a recession; and we monitor a patient's heart rate or EEG
  for disease diagnosis. Time series data is involved in many such
  applications. Much work has been devoted to pattern discovery from
  time series data, but little has attempted to use time series data
  to unveil a system's internal dynamics. In this paper, we go beyond
  learning patterns from time series data: we focus on obtaining a
  better understanding of the data-generating mechanism, and we regard
  patterns and their temporal relations as organic components of that
  hidden mechanism. Specifically, we propose to model time series data
  using a novel pattern-based hidden Markov model (pHMM), which aims
  at revealing a global picture of the system that generates the time
  series data. We propose an iterative approach to refine pHMMs
  learned from the data. In each iteration, we use the current pHMM to
  guide time series segmentation and clustering, which enables us to
  learn a more accurate pHMM.  Furthermore, we propose three pruning
  strategies to speed up the refinement process. Empirical results on
  real datasets demonstrate the feasibility and effectiveness of the
  proposed approach.
\end{abstract}

\renewcommand{\baselinestretch}{.981} \normalsize
\bibliographystyle{abbrv}


\category{H.2.8}{Database Management}{Database Applications}[Data
Mining]


\terms{Algorithms, Performance}

\keywords{Time Series, Hidden Markov Model, Pattern}
\vfill\eject
\input{introduction2}

\input{preliminary}
\input{initial}
\input{refine}

\input{experiment_sigmod11}
\input{related}

\section{Conclusion}
\label{sec:conclusion}


In this paper, we reveal the dynamics of a complex system by
learning a pattern-based hidden Markov model (pHMM) from the time
series data generated by this system. The biggest difference between
a pHMM and a traditional HMM is that in a pHMM, the observations are
not given but are themselves learned from the data. We propose an
approach that learns the patterns (observations) and the model
simultaneously, and three pruning strategies to speed up the
learning process. With a pHMM, we can perform pattern-based tasks,
such as trend prediction and pattern-based correlation detection.
Empirical results on real datasets demonstrate the feasibility and
effectiveness of the
proposed approach. % In our future work, we plan to extend it to
% multiple dimensional datasets, and stream applications.
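% The segment/cluster/estimate loop summarized above can be caricatured
% in a few lines of Python. This is a conceptual sketch only, not the
% paper's actual algorithm: the segmentation uses fixed-width windows,
% the "clustering" is a crude slope-based labeling, and all function
% names are invented for illustration. In the full approach, the current
% pHMM would guide new segment boundaries on each iteration; here the
% boundaries stay fixed for brevity.
%
% ```python
% # Sketch of one pHMM learning pass (illustrative names and thresholds):
% # 1. segment the series, 2. map segments to pattern labels,
% # 3. count transitions between pattern labels (the Markov part).
%
% def segment(series, width):
%     """Split the series into non-overlapping windows of `width` points."""
%     return [series[i:i + width]
%             for i in range(0, len(series) - width + 1, width)]
%
% def label(seg):
%     """Crude pattern label: rising, falling, or flat, by endpoint slope."""
%     slope = seg[-1] - seg[0]
%     if slope > 0.5:
%         return "up"
%     if slope < -0.5:
%         return "down"
%     return "flat"
%
% def transition_counts(labels):
%     """Count label-to-label transitions; normalizing these counts
%     would give the transition matrix of the pattern-level model."""
%     counts = {}
%     for a, b in zip(labels, labels[1:]):
%         counts[(a, b)] = counts.get((a, b), 0) + 1
%     return counts
%
% series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
% labels = [label(s) for s in segment(series, 2)]
% print(labels)                      # pattern sequence
% print(transition_counts(labels))   # pattern-transition statistics
% ```
%
% Iterating this pass, with segment boundaries re-chosen to maximize the
% current model's likelihood, is the refinement loop the paper describes.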

%{\renewcommand{\baselinestretch}{1.0} \normalsize
%\bibliographystyle{plain}
%%\bibliography{haixun}
%}

\begin{thebibliography}{10}
\renewcommand{\baselinestretch}{1.03} \normalsize

\bibitem{timeseries94}
G.~Box, G.~Jenkins, and G.~Reinsel.
\newblock {\em Time series analysis: forecasting and control}.
\newblock Prentice Hall, 1994.

\bibitem{dwt99}
K.~Chan and W.~Fu.
\newblock Efficient time series matching by wavelets.
\newblock In {\em ICDE}, 1999.

\bibitem{chatfield04series}
C.~Chatfield.
\newblock {\em The analysis of time series: an introduction}.
\newblock Chapman \& Hall/CRC, 2004.

\bibitem{chen2009concept}
S.~Chen, H.~Wang, and S.~Zhou.
\newblock {Concept clustering of evolving data}.
\newblock In {\em ICDE}, 2009.

\bibitem{chen2008stop}
S.~Chen, H.~Wang, S.~Zhou, and P.~Yu.
\newblock {Stop chasing trends: Discovering high order models in evolving
  data}.
\newblock In {\em ICDE}, 2008.

\bibitem{markovseries99}
G.~Dangelmayr, S.~Gadaleta, D.~Hundley, and M.~Kirby.
\newblock Time series prediction by estimating {M}arkov probabilities through
  topology preserving maps.
\newblock In {\em SPIE}, 1999.

\bibitem{keogh08}
H.~Ding, G.~Trajcevski, P.~Scheuermann, X.~Wang, and E.~Keogh.
\newblock Querying and mining of time series data: Experimental comparison of
  representations and distance measures.
\newblock In {\em VLDB}, 2008.

\bibitem{dft94}
C.~Faloutsos, M.~Ranganathan, and Y.~Manolopoulos.
\newblock Fast subsequence matching in time-series databases.
\newblock In {\em SIGMOD}, 1994.

\bibitem{xhhx09}
X.~Gu and H.~Wang.
\newblock Online anomaly prediction for robust cluster systems.
\newblock In {\em ICDE}, 2009.

\bibitem{paa00}
E.~Keogh, K.~Chakrabarti, M.~Pazzani, and S.~Mehrotra.
\newblock Dimensionality reduction for fast similarity search in large time
  series databases.
\newblock {\em KAIS}, 2000.

\bibitem{plr98}
E.~Keogh and M.~Pazzani.
\newblock An enhanced representation of time series which allows fast and
  accurate classification, clustering and relevance feedback.
\newblock In {\em SIGKDD}, 1998.

\bibitem{keogh01}
E.~Keogh, S.~Chu, D.~Hart, and M.~Pazzani.
\newblock An online algorithm for segmenting time series.
\newblock In {\em ICDM}, 2001.

\bibitem{keogh06}
E.~J. Keogh.
\newblock A decade of progress in indexing and mining large time series
  databases.
\newblock In {\em VLDB}, 2006.

\bibitem{sax07}
J.~Lin, E.~J. Keogh, L.~Wei, and S.~Lonardi.
\newblock Experiencing {SAX}: a novel symbolic representation of time series.
\newblock In {\em DMKD}, 2007.

\bibitem{mari96}
J.-F. Mari, D.~Fohr, and J.-C. Junqua.
\newblock A second-order {HMM} for high performance word and phoneme-based
  continuous speech recognition.
\newblock In {\em ICASSP}, 1996.

\bibitem{keoghkdd10}
A.~Mueen and E.~Keogh.
\newblock Online discovery and maintenance of time series motifs.
\newblock In {\em SIGKDD}, 2010.

\bibitem{keoghicdm09}
A.~Mueen, E.~Keogh, and N.~Bigdely-Shamlo.
\newblock Finding time series motifs in disk-resident data.
\newblock In {\em ICDM}, 2009.

\bibitem{perng00}
C.~Perng, H.~Wang, S.~R. Zhang, and D.~Parker.
\newblock Landmarks: A new model for similarity-based pattern querying in time
  series databases.
\newblock In {\em ICDE}, 2000.

\bibitem{reeves09}
G.~Reeves, J.~Liu, S.~Nath, and F.~Zhao.
\newblock Managing massive time series streams with multiscale compressed
  trickles.
\newblock In {\em VLDB}, 2009.

\bibitem{ron96}
D.~Ron, Y.~Singer, and N.~Tishby.
\newblock The power of amnesia: Learning probabilistic automata with variable
  memory length.
\newblock {\em Machine Learning}, pages 117--149, 1996.

\bibitem{tan2010adaptive}
Y.~Tan, X.~Gu, and H.~Wang.
\newblock {Adaptive system anomaly prediction for large-scale hosting
  infrastructures}.
\newblock In {\em PODC}, 2010.

\bibitem{tang07}
L.~Tang, B.~Cui, H.~Li, G.~Miao, D.~Yang, and X.~Zhou.
\newblock Effective variation management for pseudo periodical streams.
\newblock In {\em SIGMOD}, 2007.

\bibitem{wang03drift}
H.~Wang, W.~Fan, P.~S. Yu, and J.~Han.
\newblock Mining concept-drifting data streams using ensemble classifiers.
\newblock In {\em SIGKDD}, 2003.

\bibitem{ensembleoverfitting}
H.~Wang, J.~Yin, J.~Pei, P.~S. Yu, and J.~X. Yu.
\newblock Suppressing model overfitting in mining concept-drifting data
  streams.
\newblock In {\em SIGKDD}, 2006.

\bibitem{wang2010algorithmic}
P.~Wang, H.~Wang, M.~Liu, and W.~Wang.
\newblock {An algorithmic approach to event summarization}.
\newblock In {\em SIGMOD}, 2010.

\bibitem{vlhmm06}
Y.~Wang and L.~Zhou.
\newblock Mining complex time-series data by learning the temporal structure
  using {B}ayesian techniques and {M}arkovian models.
\newblock In {\em ICDM}, 2006.

\bibitem{infovis99}
J.~J. van Wijk and E.~R. van Selow.
\newblock Cluster and calendar based visualization of time series data.
\newblock In {\em INFOVIS}, 1999.

\bibitem{datamining05}
I.~H. Witten and E.~Frank.
\newblock {\em Data mining: practical machine learning tools and techniques}.
\newblock Morgan Kaufmann, 2005.

\end{thebibliography}


\end{document}
